
tiny language model



Cross-model Control: Improving Multiple Large Language Models in One-time Training

Neural Information Processing Systems

The number of large language models (LLMs) with varying parameter scales and vocabularies is increasing. While they deliver powerful performance, they also face a set of common optimization needs to meet specific requirements or standards, such as instruction following or avoiding the output of sensitive information from the real world. However, how to transfer the fine-tuning outcomes of one model to other models to reduce training costs remains a challenge. To bridge this gap, we introduce Cross-model Control (CMC), a method that improves multiple LLMs in one-time training with a portable tiny language model. Specifically, we have observed that the logit shift before and after fine-tuning is remarkably similar across different models.



On The Role of Intentionality in Knowledge Representation: Analyzing Scene Context for Cognitive Agents with a Tiny Language Model

Burgess, Mark

arXiv.org Artificial Intelligence

Cognitive abilities, which include ideas like intentionality and consciousness, have long been viewed in Western philosophy as exclusive to the human realm. Intent is roundly considered justifiable only with minimum requirements for self-awareness or situational comprehension. However, such hard line views have softened gradually with modern enlightenment, and more of us are likely to accept that terms such as 'agency', 'intelligence', and even 'emotion' can apply to other species too. Even plants lean into sunlight in an intentional way; the identification of an intention doesn't have to arise from the plant to be true. Latterly their possibility has been extended even to artificial systems, which some find more acceptable, though a modern version of the privilege argument persists in a distinction between 'simple' machinery and 'complex' biology, which many believe still holds some principled leap in understanding. Ideological 'blood-brain barriers', like these, continue to undermine efforts to form a rational causal explanation of intent, leading extremists to clutch at esoteric straws like quantum mechanics or complexity theory to account for perceived magic. In this note, I address another apparent schism that may shed light on these questions: the difference between process dynamics (the realm of physics) and interpretive semantics (the realm of linguistics and philosophy), and the suggestion that (deep down) intentionality might be a relatively simple phenomenon with an energetic explanation (as trust has been shown to be [9]). The recent acceptance of attention mechanisms in Large Language Models is a related example [19, 22].




TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment

Yang, Ke, Kindratenko, Volodymyr, Zhai, ChengXiang

arXiv.org Artificial Intelligence

Training language models (LMs) and their application agents is increasingly costly due to large datasets and models, making test failures difficult to bear. Simplified language environments serve as primordial training and testing grounds, retaining essential commonsense and communication skills but in a more digestible form, potentially enhancing the learning efficiency of LMs, and thus reducing the required model size and data volume for effective training and evaluation. In these simplified language environments, workable strategies for small models, datasets, and agents may be adaptable to larger models, datasets, and agents in complex language environments. To create such environments, we focus on two aspects: i) minimizing language dataset noise and complexity, and ii) preserving the essential text distribution characteristics. Unlike previous methods, we propose a pipeline to refine text data by eliminating noise, minimizing vocabulary, and maintaining genre-specific patterns (e.g., for books, conversation, code, etc.). Implementing this pipeline with large LMs, we have created a leaner suite of LM training and evaluation datasets: 71M Leaner-Pretrain, 7M Leaner-Instruct, Leaner-Glue for assessing linguistic proficiency, and Leaner-Eval for testing instruction-following ability. Our experiments show that leaner pre-training boosts LM learning efficiency. Tiny LMs trained on these datasets outperform those trained on original datasets in instruction-following across different language granularity levels. Moreover, the Leaner-Pretrain dataset's alignment with conventional large LM training sets enables resource-optimized analysis of how learning objectives, model architectures, and training techniques impact performance on language modeling and downstream tasks. Our code and datasets are available at https://github.com/EmpathYang/TinyHelen.git.
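The two refinement goals described above, removing noise and minimizing vocabulary while preserving text patterns, can be sketched as follows. This is an illustrative toy, not the paper's actual pipeline; the noise heuristic, the core vocabulary, and the `animal` fallback word are all assumptions made for the example.

```python
# Toy sketch of the two refinement steps: drop noisy lines, then rewrite
# out-of-vocabulary words into a small core vocabulary. The heuristics and
# vocabulary here are illustrative assumptions, not the TinyHelen pipeline.
CORE_VOCAB = {"the", "cat", "sat", "on", "a", "mat", "animal"}

def is_noise(line):
    """Treat lines dominated by non-letter characters as noise."""
    if not line:
        return True
    letters = sum(ch.isalpha() for ch in line)
    return letters < len(line) / 2

def minimize_vocab(line, fallback="animal"):
    """Replace out-of-vocabulary words with a hypothetical core synonym."""
    words = line.lower().split()
    return " ".join(w if w in CORE_VOCAB else fallback for w in words)

raw = ["the cat sat on a mat", "%%## 0x1F ~~", "the ocelot sat on a mat"]
clean = [minimize_vocab(line) for line in raw if not is_noise(line)]
# The markup-like line is dropped; "ocelot" is simplified to "animal".
assert clean == ["the cat sat on a mat", "the animal sat on a mat"]
```

A real pipeline would use an LLM for the rewriting step, as the abstract notes, but the filter-then-simplify structure is the same.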


Cross-model Control: Improving Multiple Large Language Models in One-time Training

Wu, Jiayi, Sun, Hao, Cai, Hengyi, Su, Lixin, Wang, Shuaiqiang, Yin, Dawei, Li, Xiang, Gao, Ming

arXiv.org Artificial Intelligence

The number of large language models (LLMs) with varying parameter scales and vocabularies is increasing. While they deliver powerful performance, they also face a set of common optimization needs to meet specific requirements or standards, such as instruction following or avoiding the output of sensitive information from the real world. However, how to transfer the fine-tuning outcomes of one model to other models to reduce training costs remains a challenge. To bridge this gap, we introduce Cross-model Control (CMC), a method that improves multiple LLMs in one-time training with a portable tiny language model. Specifically, we have observed that the logit shift before and after fine-tuning is remarkably similar across different models. Based on this insight, we incorporate a tiny language model with a minimal number of parameters. By training alongside a frozen template LLM, the tiny model gains the capability to alter the logits output by the LLMs. To make this tiny language model applicable to models with different vocabularies, we propose a novel token mapping strategy named PM-MinED. We have conducted extensive experiments on instruction tuning and unlearning tasks, demonstrating the effectiveness of CMC. Our code is available at https://github.com/wujwyi/CMC.
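The core observation, that the logit shift induced by fine-tuning is similar across models, suggests a simple mechanism: learn the shift once with a tiny model, then add it to any frozen LLM's logits at decoding time. The sketch below illustrates that mechanism only; the function names, vocabulary size, and numbers are invented for the example and are not taken from the CMC code.

```python
# Illustrative sketch of the logit-shift idea: a delta learned once is
# added to the logits of different frozen models, steering all of them.
# All names and values here are assumptions for the example.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_logit_shift(base_logits, delta):
    """Add the tiny model's learned logit shift to a frozen LLM's logits."""
    return [b + d for b, d in zip(base_logits, delta)]

# Two "different" frozen models over the same 4-token vocabulary.
model_a = [2.0, 0.5, 0.1, -1.0]
model_b = [1.5, 0.8, 0.0, -0.5]
# One shift, e.g. suppressing token 0 (unsafe) and promoting token 1.
delta = [-3.0, 2.0, 0.0, 0.0]

for logits in (model_a, model_b):
    probs = softmax(apply_logit_shift(logits, delta))
    # After the shift, token 1 is the argmax for both models.
    assert max(range(4), key=lambda i: probs[i]) == 1
```

In the real method the models have different vocabularies, which is why the abstract introduces the PM-MinED token mapping before the shift can be applied.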


Rethinking Optimization and Architecture for Tiny Language Models

Tang, Yehui, Liu, Fangcheng, Ni, Yunsheng, Tian, Yuchuan, Bai, Zheyuan, Hu, Yi-Qi, Liu, Sichao, Jui, Shangling, Han, Kai, Wang, Yunhe

arXiv.org Artificial Intelligence

The power of large language models (LLMs) has been demonstrated through numerous data and computing resources. However, the application of language models on mobile devices faces significant challenges in computation and memory cost; that is, tiny language models with high performance are urgently required. Limited by the highly complex training process, many details for optimizing language models are seldom studied carefully. In this study, based on a tiny language model with 1B parameters, we carefully design a series of empirical studies to analyze the effect of each component. Three perspectives are mainly discussed, i.e., neural architecture, parameter initialization, and optimization strategy. Several design formulas are empirically proven especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance and multiple-round training. Then we train PanGu-$\pi$-1B Pro and PanGu-$\pi$-1.5B Pro on 1.6T multilingual corpora, following the established formulas. Experimental results demonstrate that the improved optimization and architecture yield a notable average improvement of 8.87 on benchmark evaluation sets for PanGu-$\pi$-1B Pro. Besides, PanGu-$\pi$-1.5B Pro surpasses a range of SOTA models with larger model sizes, validating its superior performance. The code is available at https://github.com/YuchuanTian/RethinkTinyLM.
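One of the listed formulas, tokenizer compression, can be pictured as keeping only the most frequent vocabulary entries so a tiny model's embedding table stays small. The sketch below is a hedged illustration of that general idea; the function, the `<unk>` fallback convention, and the counts are assumptions, not the paper's recipe.

```python
# Hypothetical sketch of tokenizer compression: keep the `keep` most
# frequent tokens, map the rest to <unk>. Details are assumptions made
# for illustration, not the PanGu-pi training code.
from collections import Counter

def compress_vocab(token_counts, keep):
    """Return a compact vocab of the `keep` most frequent tokens."""
    kept = [tok for tok, _ in Counter(token_counts).most_common(keep)]
    return {"<unk>": 0, **{tok: i + 1 for i, tok in enumerate(kept)}}

counts = {"the": 100, "model": 40, "qux": 1, "zz": 1}
vocab = compress_vocab(counts, keep=2)
# Rare tokens lose their dedicated ids and fall back to <unk>.
assert vocab == {"<unk>": 0, "the": 1, "model": 2}
```

Shrinking the vocabulary directly shrinks the embedding and output-projection matrices, which dominate the parameter count of models at the 1B scale.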